Missing values in data analysis: Ignore or impute?
Abstract
Objective: Missing values are commonly encountered in data analysis across all types of research, and various methods have been introduced to handle them. This study compares the results of complete data analysis, the missing indicator method, mean substitution and single imputation in dealing with missing values.
Methods: A total of 202 patients discharged from the psychiatric ward of University Malaya Medical Centre (UMMC) between 27 August 2007 and 15 April 2008 were recruited. General psychopathology was measured with the Brief Psychiatric Rating Scale (BPRS-24). Information on age, gender, race, marital status and psychiatric diagnosis was collected. On follow-up, patients who had an early readmission (<6 months) were identified. A logistic regression model was built to predict early readmission from all the variables. The highest 10% (n=20) of BPRS scores were then deleted to simulate a missing at random (MAR) situation, and four different statistical methods were used to deal with the missing values.
Results: The BPRS score was significantly associated with early readmission (p<0.01) in the original complete dataset. The associations estimated with complete data analysis, the missing indicator method and mean substitution were biased and not statistically significant. Single imputation gave the estimate closest to the original, with a significant association (p<0.1).
Conclusion: Ignoring missing values results in biased estimates in data analysis. Single imputation produced an unbiased estimate of the association in the MAR situation.
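The four approaches compared in the abstract can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption for illustration only: the generated `age` and `bprs` values are made up, the fill value of 0 in the missing indicator method is one common choice, and the single imputation is done by linear regression on age, since the abstract does not say which imputation model the authors used. The sketch covers only how each method prepares the BPRS column, not the logistic regression itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, not the study's data).
n = 202
age = rng.normal(40, 12, n)
bprs = 30 + 0.3 * age + rng.normal(0, 8, n)

# Delete the 20 highest BPRS scores, mirroring the deletion step in the abstract.
k = 20
missing = np.zeros(n, dtype=bool)
missing[np.argsort(bprs)[-k:]] = True
bprs_obs = np.where(missing, np.nan, bprs)

# 1. Complete data analysis: keep only rows with an observed BPRS score.
complete_rows = ~missing

# 2. Missing indicator method: fill gaps with a constant (0 here, an assumption)
#    and add a 0/1 column flagging which rows were missing.
bprs_filled = np.where(missing, 0.0, bprs_obs)
indicator = missing.astype(float)

# 3. Mean substitution: replace each gap with the mean of the observed scores.
observed_mean = np.nanmean(bprs_obs)
bprs_mean = np.where(missing, observed_mean, bprs_obs)

# 4. Single imputation (assumed model): regress BPRS on age using observed rows,
#    then predict the missing scores once from the fitted line.
X = np.column_stack([np.ones(n), age])
coef, *_ = np.linalg.lstsq(X[complete_rows], bprs_obs[complete_rows], rcond=None)
bprs_reg = np.where(missing, X @ coef, bprs_obs)

print("rows kept by complete data analysis:", int(complete_rows.sum()))
print("true mean:", round(float(bprs.mean()), 2))
print("mean after mean substitution:", round(float(bprs_mean.mean()), 2))
print("mean after regression imputation:", round(float(bprs_reg.mean()), 2))
```

Each prepared column (or, for the missing indicator method, the filled column plus `indicator`) would then be passed to the same logistic regression so that the estimated BPRS association can be compared with the one from the original complete data.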
Similar resources
A New Algorithm to Impute the Missing Values in the Multivariate Case
There are several methods to make inferences about the parameters of the sampling distribution when we encounter the missing values and the censored data. In this paper, through the order statistics and the projection theorem, a novel algorithm is proposed to impute the missing values in the multivariate case. Then, the performance of this method is investigated through the simulation studies. ...
CF-GeNe: Fuzzy Framework for Robust Gene Regulatory Network Inference
Most Gene Regulatory Network (GRN) studies ignore the impact of the noisy nature of gene expression data despite its significant influence upon inferred results. This paper presents an innovative Collateral-Fuzzy Gene Regulatory Network Reconstruction (CF-GeNe) framework for Gene Regulatory Network (GRN) inference. The approach uses the Collateral Missing Value Estimation (CMVE) algorithm as it...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Missing Value Estimation of Epistatic Miniarray Profiling Data by Kernel PCA Regression Ensemble Approach
Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success on dealing with missing values in data sets with heterogeneous attributes (their independent attributes are of different types) referred to as imputing mixed-attribute data sets. Epistatic miniarray profiling (E-MAP) is a powerful tool for analyzing gene functions a...
Author's response to reviews. Title: Identifying Significant Genetic Regulatory Networks in the Prostate Cancer from Microarray Data Based on Transcription Factor Analysis and Conditional Independency
We appreciate the comments from the associate editor and all reviewers in this paper. We have made necessary experiments and modifications to cope with all the comments. The basic notations for different fonts are: Bold face fonts are from reviewers' original comments. Italic face fonts are the modified/added texts in the paper. Plain fonts are our answers to reviewers' comments. Q1: There is a...
A new imputation method for small software project data sets
Effort prediction is a very important issue for software project management. Historical project data sets are frequently used to support such prediction. But missing data are often contained in these data sets and this makes prediction more difficult. One common practice is to ignore the cases with missing data, but this makes the originally small software project database even smaller and can ...
Publication date: 2011